# English Visual Interaction

Qwen2 VL 2B Instruct GGUF

Qwen2-VL-2B-Instruct is a multimodal vision-language model with about 2B parameters, built on the Qwen2 architecture, that generates text from combined image and text inputs.

Tags: Image-to-Text, English

Florence 2 VLM Doc VQA

A version of microsoft/Florence-2-base-ft fine-tuned for Visual Question Answering (VQA). It interprets image content and answers questions about it.

Tags: Transformers, English

© 2025 AIbase